Discourse Segmentation of German Written Texts
Authors
Abstract
Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while taking into account punctuation, morphology, syntax, and aspects of the logical document structure of a complex text type, namely scientific articles. The algorithm and implementation of a discourse segmenter based on these principles are presented, as well as an evaluation of test runs.
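To make the idea of punctuation-driven segmentation concrete, here is a minimal sketch of a rule-based German segmenter. It is not the segmenter described in the abstract. It uses only two of the cues the abstract mentions: sentence-final punctuation, and clause-internal punctuation combined with a lexical cue. Morphology, full syntax, and logical document structure are not modelled. The conjunction list `SUBORDINATORS` and the functions `split_sentences`, `segment_sentence`, and `segment` are hypothetical illustrations. The rule "a comma is a boundary only when a subordinating conjunction follows; semicolons and colons are always boundaries" is an assumption chosen for simplicity.

```python
import re

# Hypothetical, deliberately small list of German subordinating conjunctions
# that typically open a dependent clause after a comma (an assumption, not
# the inventory used in the paper).
SUBORDINATORS = {
    "weil", "dass", "obwohl", "wenn", "während", "nachdem",
    "bevor", "damit", "da", "ob", "sodass", "falls",
}

# Sentence boundary: whitespace preceded by ".", "!" or "?".
# Note: abbreviations such as "z. B." are not handled by this toy rule.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text at sentence-final punctuation followed by whitespace."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def segment_sentence(sentence: str) -> list[str]:
    """Split one sentence into candidate elementary discourse segments.

    A boundary is placed after a semicolon or colon, or after a comma
    that is immediately followed by a subordinating conjunction.
    """
    segments: list[str] = []
    start = 0
    for match in re.finditer(r"[,;:]\s+", sentence):
        rest = sentence[match.end():]
        next_word = rest.split(maxsplit=1)[0].lower().strip(".,;:!?") if rest else ""
        if match.group().startswith(",") and next_word not in SUBORDINATORS:
            continue  # a plain comma (e.g. inside a list) is not a boundary
        # Keep the punctuation mark with the segment it closes.
        segments.append(sentence[start:match.start() + 1].strip())
        start = match.end()
    segments.append(sentence[start:].strip())
    return [s for s in segments if s]


def segment(text: str) -> list[str]:
    """Segment a text: sentences first, then clause-level boundaries."""
    return [seg for sent in split_sentences(text) for seg in segment_sentence(sent)]


if __name__ == "__main__":
    sample = ("Der Versuch scheiterte, weil das Gerät defekt war. "
              "Wir wiederholten ihn; das Ergebnis blieb gleich.")
    for seg in segment(sample):
        print(seg)
```

In the sample, the comma before "weil" and the semicolon both yield boundaries. A comma inside an enumeration such as "Druck, Temperatur und Volumen" does not. A real segmenter would need syntactic information to handle cases that this purely lexical cue misses, such as relative clauses and infinitive clauses.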
Similar resources
Discourse Segmentation of German Texts
This paper addresses the problem of segmenting German texts into minimal discourse units, as they are needed, for example, in RST-based discourse parsing. We discuss relevant variants of the problem, introduce the design of our annotation guidelines, and provide the results of an extensive interannotator agreement study of the corpus. Afterwards, we report on our experiments with three automati...
Subtopic annotation and automatic segmentation for news texts in Brazilian Portuguese
Subtopic segmentation aims to break documents into subtopical text passages, which develop a main topic in a text. Being capable of automatically detecting subtopics is very useful for several Natural Language Processing applications. For instance, in automatic summarisation, having the subtopics at hand enables the production of summaries with good subtopic coverage. Given the usefulness of su...
Coreference in Spoken vs. Written Texts: a Corpus-based Analysis
This paper describes an empirical study of coreference in spoken vs. written text. We focus on the comparison of two particular text types, interviews and popular science texts, as instances of spoken and written texts since they display quite different discourse structures. We believe in fact, that the correlation of difficulties in coreference resolution and varying discourse structures requi...
A Rule Based Approach to Discourse Parsing
In this paper we present an overview of recent developments in discourse theory and parsing under the Linguistic Discourse Model (LDM) framework, a semantic theory of discourse structure. We give a novel approach to the problem of discourse segmentation based on discourse semantics and sketch a limited but robust approach to symbolic discourse parsing based on syntactic, semantic and lexical ru...
Tense, Modality and Polarity: The Finite Verbal Group in English and German Newsgroup Texts
This paper describes work in progress on a corpus-based study, comparing seemingly similar registers in two languages: English and German newsgroup texts, collected in the Bremen Translation Corpus. Systemic Functional Grammar (SFG, Halliday 1994 [1985]) provides a theoretical framework for categorizing empirical findings. I will focus on three systems of the finite verbal group, i.e. tense, mo...